8 research outputs found

    Profit Maximizing Logistic Regression Modeling for Credit Scoring

    Multiple classification techniques have been employed for different business applications. In the particular case of credit scoring, a classifier that maximizes total profit is preferable. The recently proposed expected maximum profit (EMP) measure for credit scoring makes it possible to select the most profitable classifier. Taking the idea of the EMP one step further, it is desirable to integrate the measure into model construction and thus obtain a profit-maximizing model. Therefore, in this work we propose a method based on the ProfLogit classifier, which optimizes the coefficients of a logistic regression model using a genetic algorithm. The implemented technique shows a significant improvement over regular maximum-likelihood-based logistic regression models on real-life data sets in terms of total profit, which is the ultimate goal for most businesses.
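
The idea of fitting logistic regression coefficients with a genetic algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the ProfLogit implementation: the fitness is a simplified fixed-payoff profit rather than the EMP measure, and the names `avg_profit`, `fit_profit_ga`, `benefit`, `cost`, and `cutoff`, as well as the synthetic data, are hypothetical.

```python
import math
import random


def predict_proba(beta, x):
    """Logistic model: beta[0] is the intercept, beta[1:] the feature weights."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def avg_profit(beta, X, y, benefit=10.0, cost=1.0, cutoff=0.5):
    """Simplified stand-in for a profit measure (assumption, not the EMP):
    rejecting a true defaulter earns `benefit`, rejecting a good customer
    loses `cost`, accepted applicants contribute nothing."""
    total = 0.0
    for x, label in zip(X, y):
        if predict_proba(beta, x) > cutoff:  # applicant is rejected
            total += benefit if label == 1 else -cost
    return total / len(X)


def fit_profit_ga(X, y, pop_size=30, generations=40, seed=0):
    """Real-coded GA: tournament selection, uniform crossover,
    Gaussian mutation, and one elite carried over per generation."""
    rng = random.Random(seed)
    n_coef = len(X[0]) + 1
    # The all-zero vector rejects nobody (profit 0) and serves as a baseline.
    pop = [[0.0] * n_coef]
    pop += [[rng.uniform(-3, 3) for _ in range(n_coef)]
            for _ in range(pop_size - 1)]

    def fitness(b):
        return avg_profit(b, X, y)

    for _ in range(generations):
        scored = sorted(((fitness(b), b) for b in pop),
                        key=lambda t: t[0], reverse=True)
        nxt = [scored[0][1][:]]  # elitism: keep the best individual
        while len(nxt) < pop_size:
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0.0, 0.3) if rng.random() < 0.2 else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


# Synthetic data: the first feature drives default risk (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if x[0] + 0.2 * data_rng.gauss(0, 1) > 0.7 else 0 for x in X]

best = fit_profit_ga(X, y)
best_profit = avg_profit(best, X, y)
```

Because the profit function is a step function of the coefficients, gradient-based fitting does not apply directly, which is one reason a population-based search such as a genetic algorithm is a natural choice here.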

    Isolation-based conditional anomaly detection on mixed-attribute data to uncover workers’ compensation fraud

    The development of new data analytical methods remains a crucial factor in the combat against insurance fraud. Methods rooted in the research field of anomaly detection are considered promising candidates for this purpose. Commonly, a fraud data set contains both numeric and nominal attributes, where, due to their ease of expression, the latter often encode valuable expert knowledge. For this reason, an anomaly detection method should be able to handle a mixture of different data types and return an anomaly score that is meaningful in the context of the business application. We propose the iForestCAD approach, which computes conditional anomaly scores that are useful for fraud detection. More specifically, anomaly detection is performed conditionally on well-defined data partitions that are created on the basis of selected numeric attributes and distinct combinations of values of selected nominal attributes. In this way, the resulting anomaly scores are computed with respect to a reference group of interest, thus representing a meaningful score for domain experts. Because anomaly detection is performed conditionally, this approach allows detecting anomalies that would otherwise remain undiscovered in unconditional anomaly detection. Moreover, we present a case study in which we demonstrate the usefulness of our proposed approach on real-world workers’ compensation claims received from a large European insurance organization. As a result, the iForestCAD approach is well accepted by domain experts for its effective detection of fraudulent claims.
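
Conditional scoring of this kind can be illustrated with a small, self-contained sketch: records are partitioned by a nominal key, and a basic isolation forest is built separately inside each partition. This is an assumption-laden toy, not the iForestCAD implementation; the function names, the claim attributes (days off work, claim cost), and the data are all hypothetical.

```python
import math
import random
from collections import defaultdict

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Isolation-forest scores in (0, 1]; higher means more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in points:
        mean_h = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


def conditional_scores(records, n_trees=100, seed=0):
    """records: list of (nominal_key, numeric_tuple).
    Each record is scored only against records sharing its nominal key."""
    groups = defaultdict(list)
    for idx, (key, x) in enumerate(records):
        groups[key].append((idx, x))
    out = [None] * len(records)
    for members in groups.values():
        pts = [x for _, x in members]
        group_scores = anomaly_scores(pts, n_trees=n_trees, seed=seed)
        for (idx, _), s in zip(members, group_scores):
            out[idx] = s
    return out


# Hypothetical claims: (sector, (days off work, claim cost)).
claims = [("office", (5 + i % 6, 1000 + 50 * (i % 7))) for i in range(30)]
claims += [("construction", (40 + i % 20, 8000 + 100 * (i % 9)))
           for i in range(30)]
claims.append(("office", (50, 8400)))  # typical for construction, odd for office

scores = conditional_scores(claims)
suspect = max(range(len(claims)), key=lambda i: scores[i])
```

The last claim would blend into the construction cluster if all records were scored together; scoring it only against other office claims is what makes it stand out, which is the point of conditioning on a reference group.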

    Profit maximizing logistic model for customer churn prediction using genetic algorithms

    To detect churners in a vast customer base, as is the case with telephone service providers, companies rely heavily on predictive churn models to remain competitive in a saturated market. In previous work, the expected maximum profit measure for customer churn (EMPC) has been proposed in order to determine the most profitable churn model. However, profit concerns are not directly integrated into the model construction. Therefore, we present a classifier, named ProfLogit, that maximizes the EMPC in the training step using a genetic algorithm, where ProfLogit’s internal model structure resembles a lasso-regularized logistic model. Additionally, we introduce threshold-independent recall and precision measures based on the expected profit maximizing fraction, which is derived from the EMPC framework. Our proposed technique aims to construct profitable churn models for retention campaigns to satisfy the business requirement of profit maximization. In a benchmark study with nine real-life data sets, ProfLogit exhibits the highest overall out-of-sample EMPC performance as well as the best overall profit-based precision and recall values. As a result of the lasso resemblance, ProfLogit also performs a profit-based feature selection in which features are selected that would otherwise be excluded with an accuracy-based measure, which is another noteworthy finding.
    status: published
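
One way to read the lasso resemblance, written here as an assumption rather than as the paper's exact formulation, is as an EMPC objective with an L1 penalty on the non-intercept coefficients. The symbols $\theta$, $\beta_0$, $\beta$, and $\lambda$ are introduced only for this illustration:

```latex
\[
  \hat{\theta}
  \;=\;
  \operatorname*{arg\,max}_{\theta = (\beta_0,\, \beta)}
  \Bigl\{ \mathrm{EMPC}(\theta) \;-\; \lambda \,\lVert \beta \rVert_1 \Bigr\},
  \qquad \lambda \ge 0 .
\]
```

Under this reading, a larger $\lambda$ pushes more coefficients in $\beta$ to exactly zero, so the features that survive are those whose contribution to the EMPC outweighs the penalty. That would explain why the selected features can differ from those chosen under an accuracy-based criterion.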

    Profit Driven Decision Trees for Churn Prediction

    status: published

    Profit maximizing logistic regression modeling for customer churn prediction

    The selection of profitable classifiers is becoming more and more important in real-life situations such as customer churn management campaigns in the telecommunication sector. In previous works, the expected maximum profit (EMP) metric has been proposed, which explicitly takes the cost of the offer and the customer lifetime value (CLV) of retained customers into account. It thus permits the selection of the most profitable classifier, which better aligns with the business requirements of end users and stakeholders. However, modelers are currently limited to applying this metric in the evaluation step. Hence, we expand on the previous body of work and introduce a classifier that incorporates the EMP metric in the construction of a classification model. Our technique, called ProfLogit, explicitly takes profit maximization concerns into account during the training step, rather than the evaluation step. The technique is based on a logistic regression model that is trained using a genetic algorithm (GA). By means of an empirical benchmark study applied to real-life data sets, we show that ProfLogit generates substantial profit improvements compared to the classic logistic model for many data sets. In addition, profit-maximized coefficient estimates differ considerably in magnitude from the maximum likelihood estimates.
    status: published

    Profit driven decision trees for churn prediction

    Customer retention campaigns increasingly rely on predictive models to detect potential churners in a vast customer base. From the perspective of machine learning, the task of predicting customer churn can be presented as a binary classification problem. Using data on historic behavior, classification algorithms are built with the purpose of accurately predicting the probability of a customer defecting. The predictive churn models are then commonly selected based on accuracy-related performance measures such as the area under the ROC curve (AUC). However, these models are often not well aligned with the core business requirement of profit maximization, in the sense that the models fail to take into account not only misclassification costs but also the benefits originating from a correct classification. Therefore, the aim is to construct churn prediction models that are profitable and preferably interpretable too. The expected maximum profit measure for customer churn (EMPC) has recently been proposed in order to select the most profitable churn model. We present a new classifier that integrates the EMPC metric directly into the model construction. Our technique, called ProfTree, uses an evolutionary algorithm for learning profit-driven decision trees. In a benchmark study with real-life data sets from various telecommunication service providers, we show that ProfTree achieves significant profit improvements compared to classic accuracy-driven tree-based methods.